{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "b01cd0d5",
      "metadata": {
        "id": "b01cd0d5"
      },
      "source": [
        "# Start using Chroma\n",
        "\n",
        "<div style=\"border:2px solid #f7e097; padding:10px; margin-top:20px; background-color:#fefcd5; border-radius: 5px;\">\n",
        "    🔑 <b>Note:</b> To generate proteins with Chroma, you'll need an API key from <a href=\"https://chroma-weights.generatebiomedicines.com\">chroma-weights.generatebiomedicines.com</a>. Execute the cell below and enter your key after accepting the license.\n",
        "</div>\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c6db90e2",
      "metadata": {
        "id": "c6db90e2"
      },
      "outputs": [],
      "source": [
        "import locale\n",
        "from google.colab import output\n",
        "output.enable_custom_widget_manager()\n",
        "locale.getpreferredencoding = lambda: \"UTF-8\"\n",
        "!pip install generate-chroma > /dev/null 2>&1"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "f15f2198",
      "metadata": {
        "id": "f15f2198"
      },
      "outputs": [],
      "source": [
        "import torch\n",
        "from chroma import Chroma, Protein, conditioners, api\n",
        "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "api.register_key(input(\"Enter API key: \"))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f46c2848",
      "metadata": {
        "id": "f46c2848"
      },
      "source": [
        "To generate protein samples with Chroma, initialize the model and call the sample method. The sample method generates a protein backbone, designs a sequence, and returns a `Protein` object."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "9728242e",
      "metadata": {
        "id": "9728242e"
      },
      "outputs": [],
      "source": [
        "# Initialize the Model\n",
        "chroma = Chroma()\n",
        "\n",
        "# Sample a Protein\n",
        "protein = chroma.sample()"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "95f44aa7",
      "metadata": {
        "id": "95f44aa7"
      },
      "source": [
        "The `Protein` object enables one line inspection, saving, and loading of proteins."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "cd620ec7",
      "metadata": {
        "id": "cd620ec7"
      },
      "outputs": [],
      "source": [
        "print(protein) # Inspect the sequence of the protein sample\n",
        "protein.to('chroma_sample.cif') # Save the sample to disk\n",
        "protein = Protein('chroma_sample.cif') # Load a protein from disk"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "93c6ea6b",
      "metadata": {
        "id": "93c6ea6b"
      },
      "outputs": [],
      "source": [
        "display(protein)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ab969fef",
      "metadata": {
        "id": "ab969fef"
      },
      "outputs": [],
      "source": [
        "# Calculate sample scores\n",
        "elbo = chroma.score(protein)['elbo'].score\n",
        "print(f'sample elbo: {elbo}')"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "06688f25",
      "metadata": {
        "id": "06688f25"
      },
      "source": [
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "1be29dbb",
      "metadata": {
        "id": "1be29dbb",
        "pycharm": {
          "name": "#%% md\n"
        }
      },
      "source": [
        "# Conditioning"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "26d19c04",
      "metadata": {
        "id": "26d19c04"
      },
      "source": [
        "Chroma conditioners allow us to program proteins. In the following examples we will show conditional generation for `Infilling`, `Symmetry`, `Shape`, `Protein Classes`, and `Natural Language`."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "baf8ed65",
      "metadata": {
        "id": "baf8ed65",
        "pycharm": {
          "name": "#%% md\n"
        }
      },
      "source": [
        "## Symmetry\n",
        "\n",
        "Chroma can generate symmetric proteins with the help of the symmetry conditioner. We demonstrate a minimal example of conditioning on the cyclic point group with a 7-fold rotation axis. This point group has 7 asymmetric units arranged in a circle. The subunits are of size 50 in this example. The following parameters can be adjusted below:\n",
        "\n",
        "* `SYMMETRY_GROUP`: symmetry group, choose from {'C_2', 'C_3', ..., \"D_2\", \"D_3\", ..., \"T\", \"O\", \"I\"}\n",
        "* `SUBUNIT_SIZES`: chain lengths for asymmetric unit: e.g [100], [100, 150], more than one chain is allowed for the asymmetric unit\n",
        "* `KNBR`: number of neighbors to pay attention to during sampling. max allowed is total number of asymetric units in the protein complex - 1.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4e060bd9",
      "metadata": {
        "id": "4e060bd9",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "SYMMETRY_GROUP = \"C_7\"\n",
        "SUBUNIT_SIZES = [100]\n",
        "KNBR = 2"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "87903f39",
      "metadata": {
        "id": "87903f39",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "# Draw a Sample\n",
        "torch.manual_seed(0)\n",
        "conditioner = conditioners.SymmetryConditioner(G=SYMMETRY_GROUP, num_chain_neighbors=KNBR)\n",
        "symmetric_protein = chroma.sample(\n",
        "    chain_lengths=SUBUNIT_SIZES,\n",
        "    conditioner=conditioner,\n",
        "    langevin_factor=8,\n",
        "    inverse_temperature=8,\n",
        "    sde_func=\"langevin\",\n",
        "    potts_symmetry_order=conditioner.potts_symmetry_order)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2f5d7eef",
      "metadata": {
        "id": "2f5d7eef",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "display(symmetric_protein)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "20bcee17",
      "metadata": {
        "id": "20bcee17",
        "pycharm": {
          "name": "#%% md\n"
        }
      },
      "source": [
        "## Infilling\n",
        "\n",
        "Many protein design tasks including imputation of missing structural data, redesign of an enzyme scaffold given an active site, or redesign of the CDRs of a known antibody framework require exact specification of the known structural coordinates. The substructure conditioner enables this type of design. By specifiying the set of residues that are designable, and a protein to redesign, the user can perform infilling. In this example, a plane split is used which cuts a protein into two portions, a designable portion and a fixed portion. The following parameters can be set by the user:\n",
        "\n",
        "* `MASK_FRACTION`: the fraction of the protein to redesign.\n",
        "* `PDB_ID`: The pdb to use for a infilling. There are also a set of `TESTED_PDBS` that you can use as examples."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4f2478d5",
      "metadata": {
        "id": "4f2478d5"
      },
      "outputs": [],
      "source": [
        "TESTED_PDBS = ['3bdi', '5sv5','6qaz','2e0q','5xb0','6bde','1a8q','5o0t','1drf','1shg']\n",
        "MASK_PERCENT = 0.5 # Allow about 50% of the Protein to be designed\n",
        "PDB_ID = TESTED_PDBS[0]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "59e2d4b8",
      "metadata": {
        "id": "59e2d4b8"
      },
      "outputs": [],
      "source": [
        "# Configure Substructure Conditioner\n",
        "from chroma.utility.chroma import plane_split_protein\n",
        "protein = Protein(PDB_ID, canonicalize=True, device=device)\n",
        "\n",
        "X, C, _ = protein.to_XCS()\n",
        "residues_to_design = plane_split_protein(X, C, protein, 0.5).nonzero()[:,1].tolist()\n",
        "protein.sys.save_selection(gti=residues_to_design, selname=\"infilling_selection\")\n",
        "\n",
        "conditioner = conditioners.SubstructureConditioner(\n",
        "        protein,\n",
        "        backbone_model=chroma.backbone_network,\n",
        "        selection = 'namesel infilling_selection').to(device)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c03aeb4b",
      "metadata": {
        "id": "c03aeb4b"
      },
      "outputs": [],
      "source": [
        "# Draw a Sample\n",
        "torch.manual_seed(0)\n",
        "infilled_protein = chroma.sample(\n",
        "             protein_init=protein,\n",
        "             conditioner=conditioner,\n",
        "             langevin_factor=4.0,\n",
        "             langevin_isothermal=True,\n",
        "             inverse_temperature=8.0,\n",
        "             sde_func='langevin',\n",
        "             steps=500)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "f2f8d76f",
      "metadata": {
        "id": "f2f8d76f"
      },
      "outputs": [],
      "source": [
        "display(infilled_protein)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "269d8f5e",
      "metadata": {
        "id": "269d8f5e",
        "pycharm": {
          "name": "#%% md\n"
        }
      },
      "source": [
        "## Shape\n",
        "\n",
        "The shape conditioner enforces adherance to a predefined volumetric shape as represented by a point cloud. In the below example we use the Python Imaging Library to render a 3D point cloud from letters, and then we use the ShapeConditioner to sample backbones consistent with this point cloud. The user can set hyperparameters and vary the letter and the number of residues. For faster feedback, the number of steps has been decreased from that used in the manuscript. In this example both the choice of `LETTER` and the number of protein residues that fill the point cloud.\n",
        " * `LETTER`: a single character string containing the letter that will be made by the conditioner.\n",
        " * `NUM_RESIDUES`: the number of protein residues to fill the point cloud.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4736bb86",
      "metadata": {
        "id": "4736bb86"
      },
      "outputs": [],
      "source": [
        "LETTER = \"G\"\n",
        "NUM_RESIDUES = 1000"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2a2e109a",
      "metadata": {
        "id": "2a2e109a",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "# Configure Shape Conditioner\n",
        "from chroma.utility.chroma import letter_to_point_cloud\n",
        "letter_point_cloud = letter_to_point_cloud(LETTER)\n",
        "\n",
        "conditioner = conditioners.ShapeConditioner(\n",
        "        letter_point_cloud,\n",
        "        chroma.backbone_network.noise_schedule,\n",
        "        autoscale_num_residues=NUM_RESIDUES).to(device)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "adbe3e70",
      "metadata": {
        "id": "adbe3e70",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "# Draw a Sample\n",
        "torch.manual_seed(0)\n",
        "shaped_protein = chroma.sample(chain_lengths=[NUM_RESIDUES], conditioner=conditioner)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "111c8b74",
      "metadata": {
        "id": "111c8b74"
      },
      "outputs": [],
      "source": [
        "display(shaped_protein)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "27e1fcce",
      "metadata": {
        "id": "27e1fcce",
        "pycharm": {
          "name": "#%% md\n"
        }
      },
      "source": [
        "## CATH class\n",
        "\n",
        "Proteins can be conditionally generated with specified folds according to CATH class annotations.  This conditioner uses the ProClass Model. Below we show a minimal example conditioning on generating a protein with mostly beta content.\n",
        "\n",
        "The ProClass Conditioner can set CATH class annotations at 3 levels.\n",
        "\n",
        "* `CATH_ANNOTATION`: `X`, e.g. `2` Selects a C level annotation, in this case \"Mostly Beta\"\n",
        "* `CATH_ANNOTATION`: `X.X`, e.g. `2.60` Selects a CA level annotation, in this case \"Sandwich\"\n",
        "* `CATH_ANNOTATION`: `X.X.X` e.g. `2.60.40` Selects a CAT level annotation, in this case \"Immunoglobulin-like\"\n",
        "\n",
        "In general C level annotations are most robust.  CA and CAT level annotations typically require many more samples to get good results. See the paper experiments for details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "48107807",
      "metadata": {
        "id": "48107807",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "CATH_ANNOTATION = '2.60.40'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "24d5e164",
      "metadata": {
        "id": "24d5e164",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "# Draw a Sample\n",
        "torch.manual_seed(0)\n",
        "conditioner = conditioners.ProClassConditioner('cath', CATH_ANNOTATION)\n",
        "cath_conditioned_protein = chroma.sample(conditioner=conditioner)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2fa3492b",
      "metadata": {
        "id": "2fa3492b",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "display(cath_conditioned_protein)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4f09cb2e",
      "metadata": {
        "id": "4f09cb2e",
        "pycharm": {
          "name": "#%% md\n"
        }
      },
      "source": [
        "## Natural language"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "af8201d4",
      "metadata": {
        "id": "af8201d4"
      },
      "source": [
        "Here, we demonstrate backbone generation conditioned on natural language prompts. The sampling is guided by the gradients of a structure to text model. To condition, we define a `ProCapConditioner` with the following inputs:\n",
        "- a caption\n",
        "- the chain ID, specifying the (1-indexed) caption refers to; captions corresponding to the entire protein can be indicated with `chain_id = -1`\n",
        "- the weight with which to use the conditioner\n",
        "\n",
        "Training was performed with individual chain captions drawn from UniProt, and complex-level captions taken from the PDB.\n",
        "\n",
        "Below, we demonstrate caption-guided sampling to obtain a single chain backbone corresponding to an SH2 domain."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8d9dfce9",
      "metadata": {
        "id": "8d9dfce9",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "CAPTION = \"Crystal structure of SH2 domain\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "05e02265",
      "metadata": {
        "id": "05e02265",
        "pycharm": {
          "name": "#%%\n"
        }
      },
      "outputs": [],
      "source": [
        "# Draw a Sample\n",
        "torch.manual_seed(0)\n",
        "conditioner = conditioners.ProCapConditioner(CAPTION, -1)\n",
        "caption_conditioned_protein = chroma.sample(steps=200, chain_lengths=[110], conditioner=conditioner)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "bb43c861",
      "metadata": {
        "id": "bb43c861",
        "pycharm": {
          "name": "#%%\n"
        },
        "scrolled": true
      },
      "outputs": [],
      "source": [
        "display(caption_conditioned_protein)"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.7"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
